Accelerate Distributed Learning Related

Accelerate Distributed Learning: all about RDMA, ps-lite, distributed training, Parameter Server, and Ring All-reduce.

  • presenter: Jingrong Chen
  • time: 17.08.2018

What I learned

1. RDMA

2. IRN (Revisiting Network Support for RDMA)

3. RoCEv2 + PFC -> DCQCN

4. iWARP

5. Distributed Training

  • Data distributed (data parallelism: the training data is partitioned across workers)
  • Model distributed (model parallelism: the model is partitioned across workers)

6. Communication Structure

  • Parameter Server (e.g., TensorFlow)
  • Ring All-reduce

7. Parameter Server

Nature: a KVStore (a toy push/pull sketch follows the list)
Benefits:

  • Asynchronous updates
  • Makes fault tolerance easy
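
Below is a minimal, single-process sketch of the push/pull idea behind a parameter-server KVStore. The names (ToyKVStore, Push, Pull, lr) are illustrative only, not MXNet's or ps-lite's API; the point is that workers update and read parameters by key, and the server can apply pushes from different workers asynchronously.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy, single-process sketch of the KVStore abstraction behind a parameter
// server. Keys identify parameter shards; Push applies a gradient update,
// Pull reads the current value.
class ToyKVStore {
 public:
  void Push(uint64_t key, const std::vector<float>& grad, float lr) {
    auto& param = table_[key];
    if (param.size() < grad.size()) param.resize(grad.size(), 0.0f);
    // In a real server this update is applied per incoming request, so
    // workers never have to wait for each other (asynchronous update).
    for (size_t i = 0; i < grad.size(); ++i) param[i] -= lr * grad[i];
  }
  std::vector<float> Pull(uint64_t key) { return table_[key]; }

 private:
  std::unordered_map<uint64_t, std::vector<float>> table_;
};

int main() {
  ToyKVStore store;
  store.Push(/*key=*/0, /*grad=*/{0.5f, -0.25f}, /*lr=*/0.1f);
  std::vector<float> w = store.Pull(0);  // w == {-0.05, 0.025}
  return static_cast<int>(w.size()) - 2; // returns 0 on success
}
```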

8. Ring All-reduce

Benefits:

  • Bandwidth-optimal communication; no central parameter-server bottleneck

Disadvantages:

  • No fault tolerance
  • Not suitable for the cloud

Implementations (a toy single-process simulation of the algorithm follows the list):

  • TensorFlow + Uber Horovod
  • Baidu ring-allreduce (not available)
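
To make the algorithm concrete, here is a single-process toy simulation of ring all-reduce under assumed parameters (4 workers, 3 elements per chunk). It is not the Horovod or Baidu implementation, just the reduce-scatter + all-gather schedule that both follow.

```cpp
#include <cassert>
#include <vector>

// N workers sit on a ring and each local vector is split into N chunks.
// Phase 1 (reduce-scatter) circulates and accumulates chunks for N-1 steps;
// phase 2 (all-gather) circulates the fully reduced chunks for another N-1
// steps. Each worker sends/receives only 2*(N-1)/N of the data, instead of
// funnelling everything through one server.
int main() {
  const int N = 4;  // workers in the ring
  const int C = 3;  // elements per chunk
  std::vector<std::vector<float>> data(N, std::vector<float>(N * C));
  for (int w = 0; w < N; ++w)
    for (int i = 0; i < N * C; ++i) data[w][i] = float(w + 1);  // toy values

  auto at = [&](int c, int j) { return ((c % N + N) % N) * C + j; };

  // Phase 1: reduce-scatter. At step s, worker w-1 sends chunk (w-1-s) to
  // worker w, which accumulates it. The snapshot makes sends "simultaneous".
  for (int s = 0; s < N - 1; ++s) {
    auto snap = data;
    for (int w = 0; w < N; ++w) {
      int src = (w - 1 + N) % N, c = src - s;
      for (int j = 0; j < C; ++j) data[w][at(c, j)] += snap[src][at(c, j)];
    }
  }
  // Phase 2: all-gather. Worker w now owns the fully reduced chunk (w+1);
  // complete chunks are forwarded around the ring and copied (not added).
  for (int s = 0; s < N - 1; ++s) {
    auto snap = data;
    for (int w = 0; w < N; ++w) {
      int src = (w - 1 + N) % N, c = w - s;
      for (int j = 0; j < C; ++j) data[w][at(c, j)] = snap[src][at(c, j)];
    }
  }

  // Every worker should now hold the element-wise sum 1 + 2 + ... + N.
  for (int w = 0; w < N; ++w)
    for (int i = 0; i < N * C; ++i)
      assert(data[w][i] == float(N * (N + 1) / 2));
  return 0;
}
```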

9. ps-lite

MXNet and ps-lite are decoupled (a rough usage sketch follows the list), which means:

  • No memory management in ps-lite
  • No assumption on tensor size -> need rendezvous mode
  • 1 vs. N communication
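
A rough worker-side sketch of the ps-lite KV interface is shown below. Push/Pull/Wait are real ps-lite calls, but the exact Start/Finalize arguments and the KVWorker constructor differ between ps-lite versions, so treat this as illustrative rather than copy-paste ready.

```cpp
#include <vector>
#include "ps/ps.h"

int main(int argc, char* argv[]) {
  ps::Start(0);                                    // join the PS rendezvous
  ps::KVWorker<float> kv(0, 0);                    // app id 0, customer id 0

  std::vector<ps::Key> keys = {1, 3, 5};           // keys must be sorted
  std::vector<float> grads = {0.1f, 0.2f, 0.3f};   // one value per key here

  // Push is asynchronous and returns a timestamp; Wait blocks until the
  // request has been acknowledged by the server(s).
  int ts = kv.Push(keys, grads);
  kv.Wait(ts);

  // Pull the (updated) values back. ps-lite only moves opaque value blobs:
  // it does no memory management and assumes nothing about tensor sizes,
  // which is why large transfers need a rendezvous handshake underneath.
  std::vector<float> weights;
  kv.Wait(kv.Pull(keys, &weights));

  ps::Finalize(0, true);                           // barrier, then shut down
  return 0;
}
```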

10. Programming on Verbs

Memory must be registered before use -> manage memory manually
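
For example, a buffer has to go through ibv_reg_mr before it can be referenced by a work request, which is why the transport ends up pooling registered buffers instead of taking arbitrary pointers. A sketch (RegisterBuffer/ReleaseBuffer are made-up helper names; `pd` and proper error handling are assumed to exist elsewhere):

```cpp
#include <stdlib.h>
#include <infiniband/verbs.h>

// Every buffer touched by RDMA must be pinned and registered with the NIC.
struct RegisteredBuffer {
  void* addr;
  ibv_mr* mr;  // carries the lkey/rkey needed when posting work requests
};

RegisteredBuffer RegisterBuffer(ibv_pd* pd, size_t bytes) {
  void* addr = nullptr;
  if (posix_memalign(&addr, 4096, bytes) != 0)   // page-aligned helps pinning
    return {nullptr, nullptr};
  ibv_mr* mr = ibv_reg_mr(pd, addr, bytes,
                          IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ |
                              IBV_ACCESS_REMOTE_WRITE);
  return {addr, mr};  // cache and reuse; registration itself is expensive
}

void ReleaseBuffer(const RegisteredBuffer& buf) {
  ibv_dereg_mr(buf.mr);  // deregister before freeing the memory
  free(buf.addr);
}
```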

Work completion handler cannot block the CQ polling thread -> thread pool / coroutine
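
A common pattern is to keep the polling thread doing nothing but ibv_poll_cq and hand completions off to a worker pool or coroutine scheduler. A rough sketch (PollLoop, the queue, and the stop flag are assumed scaffolding; the workers popping from `pending` are not shown):

```cpp
#include <infiniband/verbs.h>
#include <atomic>
#include <condition_variable>
#include <deque>
#include <mutex>

std::mutex mu;
std::condition_variable cv;
std::deque<ibv_wc> pending;  // completions waiting for a worker / coroutine

void PollLoop(ibv_cq* cq, const std::atomic<bool>& stop) {
  ibv_wc wc[32];
  while (!stop.load()) {
    int n = ibv_poll_cq(cq, 32, wc);  // non-blocking; 0 means the CQ is empty
    if (n <= 0) continue;
    {
      std::lock_guard<std::mutex> lk(mu);
      for (int i = 0; i < n; ++i) pending.push_back(wc[i]);
    }
    cv.notify_all();  // wake worker threads to run the actual handlers
  }
}
```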

The number of outstanding SRs cannot exceed the SQ size, and likewise the number of outstanding RRs on the remote side -> flow control
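
A minimal way to enforce this is a credit counter sized to the queue depth: posting a send consumes a credit, and a completion (or a credit-return message from the peer) gives one back. The class below is an illustrative sketch, not ps-lite's actual flow control:

```cpp
#include <atomic>

class SendCredits {
 public:
  explicit SendCredits(int depth) : credits_(depth) {}

  // Try to reserve a slot before calling ibv_post_send; on failure the
  // request is queued locally until a credit comes back.
  bool TryAcquire() {
    int c = credits_.load();
    while (c > 0) {
      if (credits_.compare_exchange_weak(c, c - 1)) return true;
    }
    return false;
  }

  // Called from the completion handler or on a credit-return message.
  void Release() { credits_.fetch_add(1); }

 private:
  std::atomic<int> credits_;
};
```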

Small and large messages -> Eager mode & Rendezvous mode
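
A sketch of the dispatch logic, with a made-up threshold and helper names (SendEager, SendRendezvous, kEagerLimit): small messages are copied into a pre-registered buffer and sent immediately, while large ones go through a rendezvous handshake so the payload can move zero-copy.

```cpp
#include <cstddef>
#include <cstdio>

constexpr size_t kEagerLimit = 8 * 1024;  // assumed cut-off, tuning-dependent

void SendEager(const void* /*data*/, size_t len) {
  // Small: copy into a pre-registered send buffer and post right away.
  std::printf("eager send: %zu bytes\n", len);
}

void SendRendezvous(const void* /*data*/, size_t len) {
  // Large: send a small control message first so both sides can set up
  // registered memory, then move the payload zero-copy via RDMA read/write.
  std::printf("rendezvous send: %zu bytes\n", len);
}

void SendMessage(const void* data, size_t len) {
  if (len <= kEagerLimit) SendEager(data, len);       // one trip, one extra copy
  else                    SendRendezvous(data, len);  // handshake, no extra copy
}
```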


For more


This is Yiqing Ma ‘s website.


If life deals you lemons, make lemonade….